Abstract—Nonnegative Matrix Factorization (NMF) aims to factorize a matrix into two optimized nonnegative matrices and has been widely used for unsupervised learning tasks such as product recommendation based on a rating matrix. However, although networks between nodes with the same nature exist, e.g., the social network between users, standard NMF overlooks them. This leads to comparatively low recommendation accuracy, because these networks also reflect the nature of the nodes, such as the preferences of users in a social network. Moreover, social networks, as complex networks, have many different structures. Each structure is a composition of links between nodes and reflects the nature of the nodes, so retaining different network structures leads to differences in recommendation performance. To investigate the impact of these network structures on the factorization, this paper proposes four multi-level network factorization algorithms based on standard NMF, which integrate the vertical network (e.g., rating matrix) with the structures of the horizontal network (e.g., user social network). These algorithms are carefully designed, with corresponding convergence proofs, to retain four desired network structures. Experiments on synthetic data show that the proposed algorithms are able to preserve the desired network structures as designed. Experiments on real-world data show that considering the horizontal networks improves the accuracy of document clustering and recommendation over standard NMF, and that the various structures differ in their performance on these two tasks.

These results can be directly used in document clustering and 
in recommendation systems. 

Index Terms—Multi-level Network, Nonnegative Matrix Factorization, Complex Network

I. Introduction

A multi-level network is a structure that is composed of vertically connected nodes with different characters and horizontally connected nodes with the same characters. It is an abstract structure, as shown in Fig. 1, that can be used to model data from many areas. One example, in the text mining area, is author-paper-keyword mapping relations (vertical network) with the authors' social network, paper citation network and keyword co-occurrence relation network (horizontal networks), as shown in Fig. 2.

In recommender systems, a multi-level network is composed of tag-user-movie mapping relations (vertical network) with a tag similarity network, a user social network and a movie similarity network (horizontal networks), as shown in Fig. 3. The multi-level network structure is very common, especially in a big data environment. Due to the Variety property of big data [?], multiple sources and multiple attributes appear to describe the same data. To model this kind of data comprehensively, all the sources and attributes need to be considered simultaneously, so a multi-level network is a good choice. However, it is commonly believed that multi-level networks are sparse and redundant, because the scale of the nodes is normally much larger than the scale of the links.

Considering the sparsity and redundancy of the multi-level network, it would be helpful if an optimized latent space for nodes could be found according to its structure. This task is called factorization. Take text mining as an example. Suppose we have a multi-level network structure with 1000 authors, 10000 papers, and 8000 keywords. If we find a 100-dimensional space and map all the nodes into this space, we can represent each node, e.g., an author, by only a 100-dimensional vector. The term optimized means that the multi-level network structure can be reconstructed from the new representations of nodes with minimum error, such as reconstructing a link between an author and a paper by the cosine similarity between the two corresponding 100-dimensional vectors (new representations) for the author and paper. Under this constraint, it is believed that this new representation is not only more concise but also more intrinsic, because the original multi-level network structure is kept as unchanged as possible (some redundant information is removed but the significant information is kept) [?], [?]. Many real-world applications can benefit from this new concise and intrinsic representation; for example,

• Document Clustering: after the factorization of the author-paper-keyword structure, we can cluster papers with similar content together, and then help to organize and retrieve the papers or webpages;

• Recommendation: after the factorization of the tag-user-movie structure, we can recommend a movie to a user by the similarity between this movie and user using their concise representations, or recommend a tag to a user in the same fashion.

Fig. 1: Multi-level network

Fig. 2: An instance in Text mining

Fig. 3: An instance in Recommender Systems
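As a minimal sketch of how such concise representations could be used for recommendation, the snippet below ranks movies for a user by the cosine similarity between latent vectors. The three-dimensional vectors, the movie names and the helper `cosine_similarity` are illustrative assumptions, not data or code from this paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)

# Hypothetical latent vectors (made up for illustration only).
user = [0.9, 0.1, 0.0]
movies = {"movie_a": [0.8, 0.2, 0.0], "movie_b": [0.0, 0.1, 0.9]}

# Rank movies for the user by similarity, highest first.
ranking = sorted(movies, key=lambda m: cosine_similarity(user, movies[m]), reverse=True)
```

The same scoring could be applied to tags, since tags, users and movies would all live in the same latent space after factorization.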

Motivated by these real-world applications, we want to design algorithms in this study that will define and implement multi-level network factorization.

Existing factorization algorithms cannot complete this task, because they overlook the horizontal network structures. The most classical algorithm is Nonnegative Matrix Factorization (NMF) [?]. NMF is an elegant tool for factorizing a matrix into two nonnegative matrices, which has been widely used in visual tracking [?], maximum margin classification [?], face recognition [?] and probabilistic clustering [?]. Two notable inference algorithms to resolve NMF are Alternating Least Squares [?] and Multiplicative Iterative algorithms [?]. Here, we want to interpret NMF on a matrix M as a bipartite network factorization so that we can conduct our multi-level network factorization based on it. A matrix M can be seen as a bipartite network (vertical network) with rows as one kind of node and columns as another kind of node, while each element $m_{ij}$ of the matrix denotes a link between two nodes with different characters. The aim of NMF on a matrix is to find an optimized and relatively small latent space according to the bipartite network structure hidden in this matrix. However, the standard NMF cannot be directly used for the multi-level network because it only considers the vertical network. In fact, the difficulty of multi-level network factorization mainly lies in how to jointly conduct the vertical network factorization and horizontal network factorization.

In this paper, we propose four multi-level network factorization algorithms for preserving four different network structures of the horizontal network, including the whole network structure, community structure, degree distribution structure and max spanning tree structure. These structures are expressed by different link compositions which are sourced from the different natures of nodes. For example, if a user has a range of friends in a social network, it shows that he/she tends to have a range of preferences in movies. In this example, the friend relations of a user express the user's nature (preferences in movies). A natural question is how these network structures differ in their impact on recommendation or clustering performance. To evaluate these differences, we conduct experiments to compare the performance of each on two real-world tasks. As the experimental results show, we can achieve better performance by preserving the structures instead of disregarding the horizontal network. We also compare the performance of retaining different network structures on these tasks. Four cost functions are carefully designed to preserve the desired network structures. To optimize each cost function, the corresponding update equations are introduced with convergence analysis. Experiments on synthetic data show that the designed algorithms are able to preserve the desired network structures. To show the usefulness of these algorithms, two real-world tasks, document clustering and recommendation, have been carried out. The results show that the proposed algorithms achieve better clustering and recommendation accuracy than traditional NMF.

The contributions of this paper are: 

1) A general multi-level network, which can jointly express the relations between nodes that have the same or different natures by the vertical network and horizontal networks, is proposed to model different kinds of data;

2) Based on nonnegative matrix factorization, four multi-level network factorization algorithms with their convergence proofs are carefully designed to discover the latent space, with different horizontal network structures as constraints.

The rest of this paper is organized as follows. Section II reviews related work. The multi-level network and its factorization are formally defined in Section III. Our algorithms for multi-level network factorization and their convergence analysis are proposed in Section IV. Experiments on synthetic data and real-world data are conducted in Section V. Lastly, Section VI concludes the study and discusses future work.

II. Related Work

Since our motivation is to use complex network structures as constraints for nonnegative matrix factorization, this section is composed of two parts: 1) we will discuss elementary introductions to, and research on, the structures of a complex network, and 2) we will discuss recent works on factorizations with networks.


A. One-level and Multi-level Complex Network 

Complex networks are an interdisciplinary research area, attracting researchers from computer science, physics, sociology, biology, and so on. Due to the pervasiveness of the network phenomenon, complex networks have been adopted to model many things, such as the users' friend network and the cell network in the brain. Compared to the graph, the complex network area focuses more on non-trivial structures. The two outstanding structures are the small-world network published in Nature [?] and the power-law degree distribution [?] published in Science. In fact, many other different network structures have also been discovered and defined in this area [?], [13]. However, it is commonly accepted that the following network structures are the most fundamental and significant for describing the structure of a complex network: community structure [14], [15], degree distribution [16]-[18], and max spanning tree [19], [20]. To the best of our knowledge, little work has been done to consider the influences of complex network structures on factorization.

Recently, the multi-layer/multi-level network has attracted the attention of researchers. Its mathematical formulation is given in [21]. Similar to the one-layer complex network, its structures are defined and discussed in [?]. Apart from formalization and structure definition, the multi-layer network has been used for modeling the influence propagation over microblogs and the analysis and management of change propagation [24]. However, most state-of-the-art research on multi-layer/multi-level networks focuses on basic structure analysis. There is no work on factorization and its applications for document clustering or recommender systems.

Since the traditional network structures (i.e., community, degree distribution and max spanning tree) do not consider the directions of the edges in the network and our aim is to preserve these network structures after the factorization, we assume in this paper that the network is undirected.

B. Network-related factorization models/algorithms

The existing network-related models/algorithms in recommender systems and document clustering are mainly dominated by two renowned techniques: nonnegative matrix factorization and topic model.

1) Network-related Nonnegative Matrix Factorization: First, we give a brief introduction to nonnegative matrix factorization (NMF). Given a nonnegative matrix $Y_{a \times x}$ (extended to semi-nonnegative by [25]), the NMF aims to find two matrices $A_{a \times k}$ and $X_{k \times x}$ to minimize the following cost function

$$J(A, X) = \frac{1}{2}\|Y - AX\|_F^2, \tag{1}$$

where $\|\cdot\|_F$ is the Frobenius norm and the elements of A and X are also nonnegative. In the literature, constraints are added to A or X in the cost function in Eq. (1) for purposes such as sparseness constraint [?], smooth constraint [26], orthogonal constraint [27], and label information [28]. All these constraints aim to make the discovered (k-dimensional) latent space preserve more properties.

The networks/relations between data have been considered in a number of ways in NMF. One way is to let users define the must-link and cannot-link relations between data [29], and then use these relations as constraints for the NMF. Another way is to define a graph between data as the constraint of NMF (also called graph-embedding [?], [31] or graph-regularization [?], [?]). As we prove later, this constraint only preserves the community structure of the network. Some works have jointly considered two-side information during factorization. For example, the constraint in [34] considers user similarity network and post content; the constraint in [35] considers graphs from multiple domains. There are also works on using NMF to find the community structures of a network [?] or two networks [37], but not as constraints, as in this paper. These works are similar to this paper, but they do not explore the structures of graphs. As we introduced in Section II.A, there are many important structures for a given network. However, these network structures are disregarded during factorization. Our contribution is that the different network structures are considered during factorization.

2) Network-related Topic Models: Topic models [38] were originally developed to discover the hidden topics in documents, which can also be seen as a kind of factorization. Recently, some extensions of the topic models have attempted to adapt the data to network structures. Since the social network and citation network are two explicit and commonly used networks, most topic models try to adapt to these two types of network. For the social network, the Author-Recipient-Topic (ART) model [39] has been proposed to analyse the categories of roles in the social network, based on the relations of people in the network. A similar task is investigated in [40]. The social network is inferred from informal chat-room conversations utilizing the topic model [41]. The 'noisy links' and 'popularity bias' of the social network are addressed by properly designed topic models in [42] and [43]. As an important issue of social network analysis, communities [44] are extracted by the Social Topic Model (STM) [45]. The Mixed Membership Stochastic Block model is another way to learn the mixed membership vector (i.e., topic distribution) for each node of the community structure [46]. For the citation network, the Relational Topic Model (RTM) [47] is proposed to infer the topics and hierarchical topics from document networks by introducing a link variable between two documents. Unlike RTM, a block is adopted to model the link between two documents [48]. To retain the document network, Markov Random Field (MRF) is combined with topic model [49]. The communities in the citation network are also investigated [50].

Some models have also considered the hierarchical structure. The original topic model only considers two levels: document level and keyword level. Since there are many features associated with documents, each feature can be seen as the third level in the hierarchical structure, including conference [51], time [52], author [53], entity [54], emotion [55] and other labels [56]. Through these specified models, not only can the topic distribution of each document be discovered, but the topic distribution of these features can also be learned. However, all these works disregard the networks. At each level, the items are considered as independent of each other.

To summarize, many researchers have noticed the importance of network structure, and it has been considered in a variety of models. From their work, we can see that the network structure indeed can help to unveil the nature of data. However, other complex network structures (i.e., degree distribution and max spanning tree) are overlooked.

III. Problem Definition 

In this section, we will formally define and explain 
multi-level network and its factorization, as well as the 
problem encountered in multi-level network factorization. 
The designed algorithms to resolve this problem will be 
given in the subsequent section. 

A. Multi-level Network 

Definition 1 (Multi-level Network [?]): A multi-level network is composed of nodes with different characters. It includes horizontal networks between nodes with the same character and vertical networks between nodes with different characters, as shown in Fig. 1.

This is an abstract and very common model that can be 
used to model data from different areas. For example, 

• in recommender systems, there is a multi-level network structure: tag-user-movie, in which users may have trust relations with each other, tags may have correlation relations with each other and movies may have similarity relations due to their genre information;

• in the text mining area, there is a multi-level network structure: author-paper-keyword, in which authors may have cooperation relations with each other, papers may have citation relations with each other and keywords may have semantic relations with each other.

B. Horizontal-network Factorization 

Given a horizontal network $H_{n \times n}$, the element $h_{n_i,n_j} = w$ denotes that there is an edge between node $n_i$ and node $n_j$ with weight $w$. Against different backgrounds, the meanings of these edges will be different. For example, in recommender systems, there could be a user social network which expresses the trust relations between users; in the text mining area, there could be a paper citation network which expresses the citation relations between papers.

Definition 2 (Horizontal Network Factorization): Horizontal network factorization projects the nodes to a lower dimension space and, at the same time, keeps the original horizontal network structure as unchanged as possible. Based on the NMF, it can be expressed as

$$J(A) = \frac{1}{2}\|H - AA^T\|_F^2, \tag{2}$$

where $A_{n \times k}$ is a nonnegative matrix with each row corresponding to a node in the network $H_{n \times n}$. We can see that the nodes in $H_{n \times n}$ are all projected to a k-dimensional space as A. The cost function in Eq. (2) tries as far as possible to retain the network structure of H.
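To make the cost in Eq. (2) concrete, the following is a minimal NumPy sketch that evaluates it for two candidate factors. The four-node network and the function name `horizontal_cost` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def horizontal_cost(H, A):
    """Eq. (2): 0.5 * ||H - A A^T||_F^2 for a nonnegative factor A."""
    residual = H - A @ A.T
    return 0.5 * np.sum(residual ** 2)

# Hypothetical 4-node network made of two disconnected pairs (self-loops are
# included so that an exact rank-2 nonnegative factor exists).
A_exact = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 1.0]])
H = A_exact @ A_exact.T

cost_exact = horizontal_cost(H, A_exact)               # perfect reconstruction
cost_poor = horizontal_cost(H, np.full((4, 2), 0.5))   # uninformative factor
```

A factor whose rows separate the two pairs reconstructs H exactly, while a factor that treats all nodes alike leaves a residual on every entry.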

C. Vertical-network Factorization 

Given a vertical network $V_{n \times p}$, the element $v_{n_i,p_j}$ denotes that there is an edge between node $n_i$ and node $p_j$. Against different backgrounds, the meanings of these edges will also be different. For example, in recommender systems, there could be user-movie relations due to user ratings on movies; in the text mining area, there could be author-paper relations as a result of authors writing papers.

Definition 3 (Vertical Network Factorization): The vertical network factorization also projects the nodes to a lower dimension space and, at the same time, keeps the original vertical network structure as unchanged as possible. Based on the NMF, it can be expressed as

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2, \tag{3}$$

where $A_{n \times k}$ is a nonnegative matrix with rows corresponding to one kind of node in the network $V_{n \times p}$ and $X_{k \times p}$ is a nonnegative matrix with columns corresponding to the other kind of node in $V_{n \times p}$. We can see that nodes with different characters in $V_{n \times p}$ are all projected to a k-dimensional space. The cost function in Eq. (3) tries as far as possible to retain the network structure of V.

Note that the nonnegativity condition of A and X is 
necessary, because the underlying components A and X 
have their own physical interpretations. In the recommender 
system, A denotes the users’ interests and X denotes the 
movies’ properties. 

D. Multi-level Network Factorization 

The friend relations between users will apparently impact 
on users’ ratings on movies, and the citation relations 
between papers will impact on the keyword usage of each 
paper. Thus, we need a way to combine them. 

Definition 4 (Multi-level Network Factorization): Multi-level network factorization is the combination of Horizontal-network factorization and Vertical-network factorization, and the final latent k-dimensional space is shared by the two factorizations. Based on the NMF, it can be expressed as

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 + \alpha \cdot \frac{1}{2}\|H - AA^T\|_F^2, \tag{4}$$

where $\alpha$ is the parameter used to adjust the weights of the two parts in the cost function.
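The shared factor A in Eq. (4) can be illustrated with a short NumPy sketch. The matrix sizes, the random data and the function name `multilevel_cost` are assumptions made for this example only.

```python
import numpy as np

def multilevel_cost(V, H, A, X, alpha):
    """Eq. (4): vertical term plus alpha times the horizontal term, sharing A."""
    vertical = 0.5 * np.sum((V - A @ X) ** 2)
    horizontal = 0.5 * np.sum((H - A @ A.T) ** 2)
    return vertical + alpha * horizontal

rng = np.random.default_rng(0)
n, p, k = 6, 5, 2                  # hypothetical sizes
A = rng.random((n, k))
X = rng.random((k, p))
V = A @ X                          # vertical network generated exactly by (A, X)
H = A @ A.T                        # horizontal network generated exactly by A

cost_at_truth = multilevel_cost(V, H, A, X, alpha=0.5)
cost_perturbed = multilevel_cost(V, H, A + 0.1, X, alpha=0.5)
vertical_only = multilevel_cost(V, H, A + 0.1, X, alpha=0.0)
```

Setting `alpha=0.0` recovers the vertical-only cost of Eq. (3), so the gap between `cost_perturbed` and `vertical_only` is exactly the weighted horizontal penalty.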

It should be noted that H only represents one horizontal network. However, multi-level factorization can easily be achieved by adding more items on the right side of Eq. (4). Here, we only discuss one level horizontal network for the sake of simplicity, but a comprehensive analysis of the situations with a different number of levels will be tested in the experiment section.

One problem when conducting multi-level network factorization is how to preserve the network structures, such as community structure, degree distribution and max spanning tree, after the projection to the new and relatively small k-dimensional latent space. Note that the minimization of Eq. (4) cannot ensure the different structures will definitely be preserved, as will be demonstrated in the experiments in Section V.

IV. Preserving Complex Network Structures During Multi-level Network Factorization

In this section, we will introduce four algorithms for preserving four different network structures during multi-level network factorization. There are two levels of nodes with vertical network, $V_{n \times p}$, but only one level horizontal network, $H_{n \times n}$, will be considered for brevity.


A. NNMF: Preserve the Whole Network Structure 


In this situation, all edges in the network have the same status. The cost function is the same as Eq. (4). Since Eq. (4) contains second-order of matrix A, an approximation with lower computation complexity can be made by

$$P = \arg\min_{P \ge 0} \frac{1}{2}\|H - PP^T\|_F^2 \tag{5}$$

and

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 + \alpha \cdot \frac{1}{2}\|P - A\|_F^2. \tag{6}$$

The first equation is a symmetric NMF for network H. We can then obtain an optimized latent space which preserves the whole network structure and the new representation, P, of nodes by this latent space.
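The text does not state which solver is used for the symmetric NMF in Eq. (5). As one possible sketch, the code below uses a damped multiplicative update that is common for symmetric NMF, $P \leftarrow P \circ (1 - \beta + \beta\, (HP) / (PP^TP))$ with $\beta = 0.5$. The solver choice, the function name `symmetric_nmf` and the toy network are all assumptions.

```python
import numpy as np

def symmetric_nmf(H, k, n_iter=500, beta=0.5, eps=1e-12, seed=0):
    """Approximate H by P P^T with P >= 0 (Eq. (5)) via damped multiplicative updates."""
    rng = np.random.default_rng(seed)
    P = rng.random((H.shape[0], k)) + 0.1        # strictly positive start
    costs = [0.5 * np.sum((H - P @ P.T) ** 2)]
    for _ in range(n_iter):
        numerator = H @ P
        denominator = P @ (P.T @ P) + eps        # eps avoids division by zero
        P = P * (1.0 - beta + beta * numerator / denominator)
        costs.append(0.5 * np.sum((H - P @ P.T) ** 2))
    return P, costs

# Hypothetical network with two planted groups of three nodes each.
B = np.kron(np.eye(2), np.ones((3, 1)))
H = B @ B.T
P, costs = symmetric_nmf(H, k=2)
```

The returned P would then play the role of the fixed target in Eq. (6); any other nonnegative symmetric factorization routine could be substituted.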

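The derivative of Eq. (6) with respect to A is

$$\frac{\partial J}{\partial A} = (-VX^T + AXX^T) + \alpha(A - P) = A(XX^T + \alpha I) - (VX^T + \alpha P), \tag{7}$$

and according to the KKT condition

$$\left[(-VX^T + AXX^T) + \alpha(A - P)\right]_{ij} A_{ij} = 0, \tag{8}$$

the update equation is set as

$$A_{ij} \leftarrow A_{ij}\left(\frac{[VX^T + \alpha P^+]_{ij}}{[AXX^T + \alpha A + \alpha P^-]_{ij}}\right)^{1/2}, \tag{9}$$

where $P^+_{ij} = (|P_{ij}| + P_{ij})/2$, $P^-_{ij} = (|P_{ij}| - P_{ij})/2$ and $P = P^+ - P^-$. For X, the derivative is

$$\frac{\partial J}{\partial X} = -A^TV + A^TAX. \tag{10}$$

According to the KKT condition

$$\left[-A^TV + A^TAX\right]_{ij} X_{ij} = 0, \tag{11}$$

the update equation for X is

$$X_{ij} \leftarrow X_{ij}\left(\frac{[A^TV]_{ij}}{[A^TAX]_{ij}}\right)^{1/2}. \tag{12}$$

Since the update equation of X is the same as traditional NMF, we only give the convergence proof of A.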

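Proof: According to Eq. (6), the objective function with respect to A is

$$F(A) = \mathrm{tr}\left(-VX^TA^T + \frac{1}{2}AXX^TA^T + \frac{\alpha}{2}AA^T - \alpha P^+A^T + \alpha P^-A^T\right) \tag{13}$$

and define

$$G(A, A^t) = -\sum_{ij}(VX^T)_{ij}A^t_{ij}\left(1 + \log\frac{A_{ij}}{A^t_{ij}}\right) + \frac{1}{2}\sum_{ij}\frac{(A^tXX^T)_{ij}A_{ij}^2}{A^t_{ij}} + \frac{\alpha}{2}\sum_{ij}A_{ij}^2 - \alpha\sum_{ij}P^+_{ij}A^t_{ij}\left(1 + \log\frac{A_{ij}}{A^t_{ij}}\right) + \alpha\sum_{ij}P^-_{ij}\frac{A_{ij}^2 + (A^t_{ij})^2}{2A^t_{ij}}. \tag{14}$$

Then, $G(A, A^t)$ is the auxiliary function of $F(A)$, because the conditions

$$F(A^t) = G(A^t, A^t), \qquad G(A^t, A^t) \ge G(A^{t+1}, A^t), \qquad G(A^{t+1}, A^t) \ge F(A^{t+1}) \tag{15}$$

are satisfied when $A^{t+1}$ takes the minimum value of $G(A, A^t)$ with respect to $A_{ij}$. The derivative of $G(A, A^t)$ is

$$\frac{\partial G(A, A^t)}{\partial A_{ij}} = -\frac{(VX^T)_{ij}A^t_{ij}}{A_{ij}} + \frac{(A^tXX^T)_{ij}A_{ij}}{A^t_{ij}} + \alpha A_{ij} - \alpha\frac{P^+_{ij}A^t_{ij}}{A_{ij}} + \alpha\frac{P^-_{ij}A_{ij}}{A^t_{ij}}, \tag{16}$$

and set $A^{t+1}$ as the minimal value of $G(A, A^t)$ through

$$A^{t+1}_{ij} = A^t_{ij}\left(\frac{[VX^T + \alpha P^+]_{ij}}{[A^tXX^T + \alpha A^t + \alpha P^-]_{ij}}\right)^{1/2}. \tag{17}$$

Therefore, we know that the update of A according to Eq. (9) will lead to the non-increasing of $J(A, X)$. ■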

We can see from the update equations for A and X that, since we only consider one level network, only A is influenced by the horizontal network structure in Eq. (9). The update equation of X is the same as Eq. (12), since the new term does not impact on the derivative of the cost function with respect to X.
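The two multiplicative updates, Eq. (9) for A and Eq. (12) for X, can be alternated as in the following NumPy sketch. The random test data, the small constant `eps` added to the denominators and the function names are assumptions introduced for illustration; this is not the revised step-size procedure of Algorithm 1.

```python
import numpy as np

def nnmf_cost(V, P, A, X, alpha):
    """Cost of Eq. (6)."""
    return 0.5 * np.sum((V - A @ X) ** 2) + 0.5 * alpha * np.sum((P - A) ** 2)

def nnmf_updates(V, P, alpha=1.0, n_iter=200, eps=1e-12, seed=0):
    """Alternate the multiplicative updates of Eq. (9) (for A) and Eq. (12) (for X)."""
    rng = np.random.default_rng(seed)
    n, k = P.shape
    A = rng.random((n, k)) + 0.1
    X = rng.random((k, V.shape[1])) + 0.1
    P_plus = (np.abs(P) + P) / 2.0       # positive part of P
    P_minus = (np.abs(P) - P) / 2.0      # negative part of P
    history = [nnmf_cost(V, P, A, X, alpha)]
    for _ in range(n_iter):
        A = A * np.sqrt((V @ X.T + alpha * P_plus) /
                        (A @ (X @ X.T) + alpha * A + alpha * P_minus + eps))
        X = X * np.sqrt((A.T @ V) / (A.T @ A @ X + eps))
        history.append(nnmf_cost(V, P, A, X, alpha))
    return A, X, history

# Hypothetical data: V is generated from P, so a zero-cost solution exists.
data_rng = np.random.default_rng(42)
P = data_rng.random((8, 3))
V = P @ data_rng.random((3, 6))
A, X, history = nnmf_updates(V, P, alpha=0.5)
```

Because both updates only multiply by nonnegative ratios, A and X stay nonnegative without any explicit projection.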
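Algorithm 1: NNMF

Input: H and V, maximum iteration number $I_{max}$
Output: A and X
1: $P = \arg\min_{P \ge 0} \frac{1}{2}\|H - PP^T\|_F^2$;
2: while $i < I_{max}$ do
3:   if $\partial J / \partial A_{ij} < 0$ in Eq. (7) then
4:     $\bar{A}_{ij} = \max(A_{ij}, \sigma)$;
5:   end if
6:   compute $\eta_A$ by Eq. (19);
7:   $A_{ij} \leftarrow A_{ij} - \eta_A \, \partial J / \partial A_{ij}$;
8:   if $\partial J / \partial X_{ij} < 0$ in Eq. (10) then
9:     $\bar{X}_{ij} = \max(X_{ij}, \sigma)$;
10:   end if
11:   compute $\eta_X$ by Eq. (21);
12:   $X_{ij} \leftarrow X_{ij} - \eta_X \, \partial J / \partial X_{ij}$;
13:   $i = i + 1$;
14: end while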


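Eq. (17) is equivalent to updating A through the gradient descent method with a special step size [?], [57]. The corresponding step size of Eq. (17) is

$$\eta_A = A_{ij}\left[\left((AXX^T + \alpha A + \alpha P^-)_{ij}^{1/2} + (VX^T + \alpha P^+)_{ij}^{1/2}\right)(AXX^T + \alpha A + \alpha P^-)_{ij}^{1/2}\right]^{-1}. \tag{18}$$

As pointed out in [57], [58], the update in Eq. (17) might not converge to a stationary point due to an improper step size. To ensure the convergence of the update, we revise this step size as

$$\eta_A = \frac{\bar{A}_{ij}}{\left((AXX^T + \alpha A + \alpha P^-)_{ij}^{1/2} + (VX^T + \alpha P^+)_{ij}^{1/2}\right)(AXX^T + \alpha A + \alpha P^-)_{ij}^{1/2} + \delta}, \tag{19}$$

where

$$\bar{A}_{ij} = \begin{cases} A_{ij}, & \text{if } \partial J / \partial A_{ij} \ge 0 \\ \max(A_{ij}, \sigma), & \text{if } \partial J / \partial A_{ij} < 0 \end{cases} \tag{20}$$

and $\delta$ and $\sigma$ are two small positive numbers.

For X, the corresponding step size of Eq. (12) is

$$\eta_X = \frac{\bar{X}_{ij}}{\left((A^TAX)_{ij}^{1/2} + (A^TV)_{ij}^{1/2}\right)(A^TAX)_{ij}^{1/2} + \delta}. \tag{21}$$

The whole procedure is summarized in Algorithm 1.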
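The equivalence between the multiplicative update and a gradient step can be checked numerically. The sketch below, on hypothetical random data, compares Eq. (9) with a gradient step that uses the step size of Eq. (18), and then applies the revised step of Eqs. (19) and (20). The function name `step_size_A` and the values chosen for $\sigma$ and $\delta$ are assumptions.

```python
import numpy as np

def step_size_A(A_bar, pos, neg, delta):
    """Eq. (19), with pos = A X X^T + alpha A + alpha P^- and neg = V X^T + alpha P^+."""
    return A_bar / ((np.sqrt(pos) + np.sqrt(neg)) * np.sqrt(pos) + delta)

rng = np.random.default_rng(1)
n, p, k, alpha = 5, 4, 2, 0.3
V, P = rng.random((n, p)), rng.random((n, k))    # P >= 0 here, so P^- = 0
A, X = rng.random((n, k)) + 0.1, rng.random((k, p)) + 0.1

pos = A @ X @ X.T + alpha * A       # positive part of the gradient in Eq. (7)
neg = V @ X.T + alpha * P           # negative part of the gradient in Eq. (7)
grad = pos - neg

multiplicative = A * np.sqrt(neg / pos)                     # Eq. (9)
gradient_form = A - step_size_A(A, pos, neg, 0.0) * grad    # step size of Eq. (18)

# Revised step of Eqs. (19)-(20): lift small entries whose gradient is negative.
sigma, delta = 1e-6, 1e-9
A_bar = np.where(grad < 0, np.maximum(A, sigma), A)
A_next = A - step_size_A(A_bar, pos, neg, delta) * grad
```

With $\delta = 0$ and $\bar{A} = A$ the two forms coincide; the revision only matters for entries that have shrunk close to zero while their gradient is still negative.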


B. CNMF: Preserve Community Structure 

Community structure [13], [37] is an important structural property of a complex network. As with the whole network structure, our idea is to project the nodes into a latent space which can preserve community structure rather than the whole network structure. Then, the problem becomes how to find a space to preserve the community structure. The Laplacian matrix of the original network matrix, H, which is broadly used in spectral analysis [60], is considered here. One of its definitions is

$$L = D - H, \qquad (P, \Lambda) = \mathrm{svd}(L), \tag{22}$$

where D is the degree matrix defined as $d_{ii} = \sum_j h_{ij}$ and $d_{ij} = 0$ $(i \ne j)$, $\mathrm{svd}(\cdot)$ is the singular value decomposition operation, P is the eigenvectors and $\Lambda$ is the eigenvalues.

With this latent space, P, the cost function can be easily designed as

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 + \alpha \cdot \frac{1}{2}\|P_k - A\|_F^2, \tag{23}$$

where $P_k$ is the first k eigenvectors in P. Then, the update equations are the same as Eq. (9) and Eq. (12). Next, we try to prove the ability to preserve the community structure of the designed cost function.

Proof: P's ability to preserve the community structure originates from graph cut theory. In this theory, separating a group of nodes into k subgroups is equal to optimizing the following cost function

$$\min_G \mathrm{RatioCut}(G_1, G_2, \ldots, G_k) = \min_G \sum_{i=1}^{k}\frac{\mathrm{cut}(G_i, \bar{G}_i)}{|G_i|} = \min_Q \mathrm{Tr}(Q^TLQ), \quad \text{s.t. } Q^TQ = I, \tag{24}$$

where $G = \{G_1, G_2, \ldots, G_k\}$ is the partition of nodes, $|G_i|$ is the number of nodes in $G_i$, $\mathrm{cut}(G_i, \bar{G}_i)$ is the total weight of the edges between $G_i$ and the remaining nodes, and Q is a designed indicator matrix (more details can be found in [60]). With a small relaxation, the solution Q in Eq. (24) is just the matrix that contains the first k eigenvectors of L. That means our $P_k$ can optimize the cost function in Eq. (24), and then can give the best partition of the network. Therefore, the cost function in Eq. (23) is able to preserve the community structure of the original network H. ■
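The construction of $P_k$ can be sketched in NumPy as follows. Two choices here are assumptions rather than statements from the paper: `numpy.linalg.eigh` is used in place of an SVD because L is symmetric, and "the first k eigenvectors" is read as those with the k smallest eigenvalues, as is usual in spectral clustering. The toy network and the function name are also hypothetical.

```python
import numpy as np

def community_latent_space(H, k):
    """Return L = D - H with the k smallest eigenvalues of L and their eigenvectors."""
    D = np.diag(H.sum(axis=1))                      # degree matrix, d_ii = sum_j h_ij
    L = D - H
    eigenvalues, eigenvectors = np.linalg.eigh(L)   # ascending order for symmetric L
    return L, eigenvalues[:k], eigenvectors[:, :k]

# Hypothetical network: two triangles with no edge between them.
triangle = np.ones((3, 3)) - np.eye(3)
H = np.kron(np.eye(2), triangle)

L, lam, P_k = community_latent_space(H, k=2)

# P_k may contain negative entries, hence the split used by the update in Eq. (9).
P_plus = (np.abs(P_k) + P_k) / 2.0
P_minus = (np.abs(P_k) - P_k) / 2.0
```

For this toy network the two selected eigenvalues are zero, and nodes in the same triangle receive identical rows in `P_k`, which is the sense in which the latent space encodes the communities.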


It should be noted that the cost function in Eq. (23) has the same trend as

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 + \alpha \cdot \mathrm{tr}(A^TLA). \tag{25}$$


This equation directly combines the NMF and Graph-Cut, and is commonly adopted by other research [31], [32] for graph-embedding or graph-regularization. The difference between Eq. (23) and Eq. (25) is the same as the difference between Eq. (6) and Eq. (4).

The whole procedure is summarized in Algorithm 2.
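Algorithm 2: CNMF

Input: H and V, maximum iteration number $I_{max}$
Output: A and X
1: $L = D - H$;
2: $P = \mathrm{svd}(L)$;
3: while $i < I_{max}$ do
4:   if $\partial J / \partial A_{ij} < 0$ in Eq. (7) then
5:     $\bar{A}_{ij} = \max(A_{ij}, \sigma)$;
6:   end if
7:   compute $\eta_A$ by Eq. (19);
8:   $A_{ij} \leftarrow A_{ij} - \eta_A \, \partial J / \partial A_{ij}$;
9:   if $\partial J / \partial X_{ij} < 0$ in Eq. (10) then
10:     $\bar{X}_{ij} = \max(X_{ij}, \sigma)$;
11:   end if
12:   compute $\eta_X$ by Eq. (21);
13:   $X_{ij} \leftarrow X_{ij} - \eta_X \, \partial J / \partial X_{ij}$;
14:   $i = i + 1$;
15: end while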


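C. DNMF: Preserve Degree Distribution Structure

The degree distribution [?], [?], [18] is another very important structure. A node, i, in a network will have a number of neighbors, $d_i$. The degree distribution can be described by a function $n_d \sim \pi(d)$, where $n_d$ denotes the number of nodes with the same degree d and $\pi(\cdot)$ is the distribution of the number of degrees. Our idea to preserve the degree distribution is to maximize the correlation between the node degree sequences of the original matrix and the newly generated matrix (formed by the k-dimensional latent space).

Suppose the network matrix H is a binary matrix. Then

$$H\mathbf{1}_{n \times 1} \tag{26}$$

will be a vector containing the degrees of all nodes. Similarly,

$$AA^T\mathbf{1}_{n \times 1} \tag{27}$$

is also a vector containing the degrees of all nodes in the k-dimensional latent space. The distance between the two degree vectors can be evaluated by

$$\left\|H\mathbf{1}_{n \times 1} - AA^T\mathbf{1}_{n \times 1}\right\|_2^2. \tag{28}$$

If we can minimize this distance, the degree distributions in the original space and the new latent space will be similar. If H is not a binary matrix but a real-valued matrix, the degree sequence can be seen as a weighted degree distribution. Then, a new cost function with this term is

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 + \frac{1}{2}\alpha\left\|H\mathbf{1}_{n \times 1} - AA^T\mathbf{1}_{n \times 1}\right\|_2^2. \tag{29}$$

The derivative with respect to A is

$$\frac{\partial J(A, X)}{\partial A} = -VX^T + AXX^T - \alpha\, H\mathbf{1}_{n \times n}A + 2\alpha\, AA^T\mathbf{1}_{n \times n}A, \tag{30}$$

where $\mathbf{1}_{n \times n} = \mathbf{1}_{n \times 1}\mathbf{1}_{1 \times n}$ is the all-ones matrix. The update equation of A is set as

$$A_{ij} \leftarrow A_{ij}\left(\frac{\sqrt{(AXX^T)_{ij}^2 + 8\alpha\,(AA^T\mathbf{1}_{n \times n}A)_{ij}\,(VX^T + \alpha H\mathbf{1}_{n \times n}A)_{ij}} - (AXX^T)_{ij}}{4\alpha\,(AA^T\mathbf{1}_{n \times n}A)_{ij}}\right)^{1/2}. \tag{31}$$

Proof: According to Eq. (29), the objective function with respect to A is

$$F(A) = \mathrm{tr}\left(-VX^TA^T + \frac{1}{2}AXX^TA^T - \alpha\, H\mathbf{1}_{n \times n}AA^T + \frac{\alpha}{2}\, AA^T\mathbf{1}_{n \times n}AA^T\right) \tag{32}$$

and define

$$G(A, A^t) = -\sum_{ij}(VX^T)_{ij}A^t_{ij}\left(1 + \log\frac{A_{ij}}{A^t_{ij}}\right) + \frac{1}{2}\sum_{ij}\frac{(A^tXX^T)_{ij}A_{ij}^2}{A^t_{ij}} - \alpha\sum_{ij}(H\mathbf{1}_{n \times n}A^t)_{ij}A^t_{ij}\left(1 + \log\frac{A_{ij}}{A^t_{ij}}\right) + \frac{\alpha}{2}\sum_{ij}\frac{(A^tA^{tT}\mathbf{1}_{n \times n}A^t)_{ij}A_{ij}^4}{(A^t_{ij})^3}. \tag{33}$$

Then, $G(A, A^t)$ is the auxiliary function of $F(A)$, because Eq. (15) is satisfied. The derivative of $G(A, A^t)$ with respect to A is

$$\frac{\partial G(A, A^t)}{\partial A_{ij}} = -\frac{(VX^T)_{ij}A^t_{ij}}{A_{ij}} + \frac{(A^tXX^T)_{ij}A_{ij}}{A^t_{ij}} - \alpha\frac{(H\mathbf{1}_{n \times n}A^t)_{ij}A^t_{ij}}{A_{ij}} + 2\alpha\frac{(A^tA^{tT}\mathbf{1}_{n \times n}A^t)_{ij}A_{ij}^3}{(A^t_{ij})^3}, \tag{34}$$

and $A^{t+1}$ is set through $\partial G(A, A^t) / \partial A_{ij} = 0$, from which Eq. (31) is derived.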
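The degree term of Eqs. (26) to (29) can be illustrated with a small NumPy sketch. The star network, the rank-one factor and the function names are hypothetical; the example is chosen to show that the degree penalty can vanish even when $AA^T$ does not reproduce the edges of H.

```python
import numpy as np

def degree_vector(M):
    """Row sums M 1_{n x 1}: the (weighted) degrees of all nodes, as in Eq. (26)."""
    return M @ np.ones(M.shape[1])

def dnmf_cost(V, H, A, X, alpha):
    """Eq. (29): vertical term plus alpha/2 times the degree distance of Eq. (28)."""
    vertical = 0.5 * np.sum((V - A @ X) ** 2)
    degree_gap = degree_vector(H) - A @ (A.T @ np.ones(A.shape[0]))
    return vertical + 0.5 * alpha * np.sum(degree_gap ** 2)

# Hypothetical star network: node 0 is linked to nodes 1, 2 and 3.
H = np.zeros((4, 4))
H[0, 1:] = H[1:, 0] = 1.0
degrees = degree_vector(H)

# A rank-one factor whose latent degrees A A^T 1 equal the original degrees.
A = degrees.reshape(-1, 1) / np.sqrt(degrees.sum())
X = np.ones((1, 3))
V = A @ X

cost = dnmf_cost(V, H, A, X, alpha=1.0)
edge_error = np.sum((H - A @ A.T) ** 2)
```

Here `cost` is numerically zero while `edge_error` is not, which reflects that this term constrains only the degree sequence and not individual links.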























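The revised step size for the update is

$$\eta_A = A_{ij}\,\frac{\big(4a\,(AA^{T}\mathbf{1}\mathbf{1}^{T}A)_{ij}\big)^{1/2} - \Big(\big((AXX^{T})_{ij}^{2} + 8a\,(AA^{T}\mathbf{1}\mathbf{1}^{T}A)_{ij}\,(VX^{T} + a \cdot H\mathbf{1}\mathbf{1}^{T}A)_{ij}\big)^{1/2} - (AXX^{T})_{ij}\Big)^{1/2}}{\big(4a\,(AA^{T}\mathbf{1}\mathbf{1}^{T}A)_{ij}\big)^{1/2}\,\Big(\big(-VX^{T} + AXX^{T} - a \cdot H\mathbf{1}\mathbf{1}^{T}A + 2a \cdot AA^{T}\mathbf{1}\mathbf{1}^{T}A\big)_{ij} + \delta\Big)} \qquad (35)$$

where $\delta$ satisfies Eq. (20) with the gradient replaced by Eq. (30).
The whole procedure is summarized in Algorithm 3.

Algorithm 3: DNMF

Input: $H$ and $V$; maximum iteration number $I_{max}$
Output: $A$ and $X$
while $i < I_{max}$ do
    if $\partial J / \partial A_{ij} < 0$ in Eq. (30) then
        $A_{ij} = \max(A_{ij}, \sigma)$;
    end if
    compute $\eta_A$ by Eq. (35);
    $A_{ij} \leftarrow A_{ij} - \eta_A \, \partial J / \partial A_{ij}$;
    if $\partial J / \partial X_{ij} < 0$ in Eq. (10) then
        $X_{ij} = \max(X_{ij}, \sigma)$;
    end if
    compute $\eta_X$ by Eq. (21);
    $X_{ij} \leftarrow X_{ij} - \eta_X \, \partial J / \partial X_{ij}$;
    $i = i + 1$;
end while

D. TNMF: Preserve Max Spanning Tree Structure

The max spanning tree [59] of network $H$ is first mined,

$$T \otimes H = T_H \qquad (36)$$

where $\otimes$ denotes the Hadamard (element-wise) product, $T_H$
is the mined max spanning tree of network $H$, and $T$ is
called the tree-mask matrix, which is a binary matrix with
$t_{ij} = 1$ if edge $(i, j)$ is in $T_H$ and $t_{ij} = 0$ otherwise. $\bar{T}$ is the
complement of $T$, with $\bar{t}_{ij} = 1$ if edge $(i, j)$ is not in $T_H$ and
$\bar{t}_{ij} = 0$ otherwise.

The cost function is

$$J(A, X) = \frac{1}{2}\|V - AX\|_F^2 - a \cdot \frac{1}{4}\Big(\|T \otimes (AA^{T})\|_F^2 - \|\bar{T} \otimes (AA^{T})\|_F^2\Big) \qquad (37)$$

The derivative with respect to $A$ is

$$\frac{\partial J(A, X)}{\partial A_{ij}} = \Big(-VX^{T} + AXX^{T} + a \cdot (\bar{T} - T) \otimes (AA^{T})A\Big)_{ij} \qquad (38)$$

The update equation for $A$ is

$$A_{ij} \leftarrow A_{ij}\left[\frac{\Big((AXX^{T})_{ij}^{2} + 4a\,\big(\bar{T} \otimes (AA^{T})A\big)_{ij}\,\big(VX^{T} + a\,T \otimes (AA^{T})A\big)_{ij}\Big)^{1/2} - (AXX^{T})_{ij}}{2a\,\big(\bar{T} \otimes (AA^{T})A\big)_{ij}}\right]^{1/2} \qquad (39)$$

The convergence proof of this update equation is given as
follows.

Proof: According to Eq. (37), the objective function with
respect to $A$ is

$$F(A) = \mathrm{tr}\Big(-VX^{T}A^{T} + \frac{1}{2}AXX^{T}A^{T} + \frac{1}{4}a \cdot \big((\bar{T} - T) \otimes AA^{T}\big)AA^{T}\Big) \qquad (40)$$

and define

$$G(A, A^{t}) = -\sum_{ij}(VX^{T})_{ij}A^{t}_{ij}\Big(1 + \log\frac{A_{ij}}{A^{t}_{ij}}\Big) + \frac{1}{2}\sum_{ij}\frac{(A^{t}XX^{T})_{ij}A_{ij}^{2}}{A^{t}_{ij}} + \frac{a}{4}\sum_{ij}\frac{\big(\bar{T} \otimes (A^{t}(A^{t})^{T})A^{t}\big)_{ij}A_{ij}^{4}}{(A^{t}_{ij})^{3}} - \frac{a}{4}\sum_{ijkl}t_{ik}A^{t}_{ij}A^{t}_{kj}A^{t}_{il}A^{t}_{kl}\Big(1 + \log\frac{A_{ij}A_{kj}A_{il}A_{kl}}{A^{t}_{ij}A^{t}_{kj}A^{t}_{il}A^{t}_{kl}}\Big) \qquad (41)$$

Then, $G(A, A^{t})$ is the auxiliary function of $F(A)$, because
the auxiliary-function condition given earlier is satisfied. The
derivative of $G(A, A^{t})$ with respect to $A$ is

$$\frac{\partial G(A, A^{t})}{\partial A_{ij}} = -\frac{(VX^{T})_{ij}A^{t}_{ij}}{A_{ij}} + \frac{(A^{t}XX^{T})_{ij}A_{ij}}{A^{t}_{ij}} + a\,\frac{\big(\bar{T} \otimes (A^{t}(A^{t})^{T})A^{t}\big)_{ij}A_{ij}^{3}}{(A^{t}_{ij})^{3}} - a\,\frac{\big(T \otimes (A^{t}(A^{t})^{T})A^{t}\big)_{ij}A^{t}_{ij}}{A_{ij}} \qquad (42)$$

and set $A^{t+1}$ as the minimal value of $G(A, A^{t})$ through
$\partial G(A, A^{t}) / \partial A_{ij} = 0$, we get Eq. (39). Therefore, we know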


that the update of $A$ according to Eq. (39) will lead to the
non-increasing of $J(A, X)$.

Algorithm 4: TNMF

Input: $H$ and $V$; maximum iteration number $I_{max}$
Output: $A$ and $X$
Obtain the max spanning tree $T$ of $H$;
while $i < I_{max}$ do
    if $\partial J / \partial A_{ij} < 0$ in Eq. (38) then
        $A_{ij} = \max(A_{ij}, \sigma)$;
    end if
    compute $\eta_A$ by Eq. (43);
    $A_{ij} \leftarrow A_{ij} - \eta_A \, \partial J / \partial A_{ij}$;
    if $\partial J / \partial X_{ij} < 0$ in Eq. (10) then
        $X_{ij} = \max(X_{ij}, \sigma)$;
    end if
    compute $\eta_X$ by Eq. (21);
    $X_{ij} \leftarrow X_{ij} - \eta_X \, \partial J / \partial X_{ij}$;
    $i = i + 1$;
end while


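The revised step size for the update is

$$\eta_A = A_{ij}\,\frac{\big(2a\,(\bar{T} \otimes (AA^{T})A)_{ij}\big)^{1/2} - \Big(\big((AXX^{T})_{ij}^{2} + 4a\,(\bar{T} \otimes (AA^{T})A)_{ij}\,(VX^{T} + a\,T \otimes (AA^{T})A)_{ij}\big)^{1/2} - (AXX^{T})_{ij}\Big)^{1/2}}{\big(2a\,(\bar{T} \otimes (AA^{T})A)_{ij}\big)^{1/2}\,\Big(\big(-VX^{T} + AXX^{T} + a \cdot (\bar{T} - T) \otimes (AA^{T})A\big)_{ij} + \delta\Big)} \qquad (43)$$

where $\delta$ satisfies Eq. (20) with the gradient replaced by Eq. (38).
The whole procedure is summarized in Algorithm 4.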


V. Experimental Results and Analysis 

This section is composed of two parts: experiments on
synthetic data and experiments on real-world data. The
first part is used to verify the correctness of our proposed
algorithms. The second part is used to show the usefulness
of our work. For the sake of brevity, abbreviations are used
as follows: NNMF for the algorithm proposed in Section
IV(A) to preserve the whole network structure; CNMF for
the algorithm proposed in Section IV(B) to preserve the
community structure; DNMF for the algorithm proposed in
Section IV(C) to preserve the degree distribution structure;
TNMF for the algorithm proposed in Section IV(D) to
preserve the max spanning tree structure.


1) Convergence analysis: Although we have given
proofs of the convergence of the proposed algorithms in
Section IV, the actual values of the cost functions are reported
in this section to show the convergence of the algorithms under
the designed update equations. First, the matrices V and
H are randomly generated, and the four algorithms are used to
factorize them. The value of the cost function after each
iteration is recorded. The maximum number of iterations is set
to 10,000. It should be noted that the iterations may stop before
this number is reached once the cost function becomes
stationary, i.e., its change between two consecutive iterations
is smaller than 1e-10. We can see from Fig. 4 that the values of
the cost functions decrease under the designed update equations
of the four algorithms, in line with our proofs.
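The recording and stopping rule described above can be illustrated with a minimal sketch. It does not implement NNMF, CNMF, DNMF or TNMF: as a stand-in it uses the standard multiplicative updates of plain NMF, which minimize $\frac{1}{2}\|V - AX\|_F^2$ without any network term. The names `nmf_with_history`, `matmul`, `transpose` and `cost`, and the small `eps` guard against division by zero, are assumptions made for illustration only.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def cost(v, a, x):
    """Plain NMF cost 0.5 * ||V - AX||_F^2 (no network-structure term)."""
    ax = matmul(a, x)
    return 0.5 * sum((p - q) ** 2 for vr, ar in zip(v, ax) for p, q in zip(vr, ar))


def nmf_with_history(v, k, max_iter=10000, tol=1e-10, seed=0, eps=1e-12):
    """Stand-in factorization: standard multiplicative updates for V ~ AX.

    Records the cost after every iteration and stops early once the change
    between two consecutive iterations is smaller than `tol`.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    a = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    x = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    history = [cost(v, a, x)]
    for _ in range(max_iter):
        # X <- X * (A^T V) / (A^T A X), element-wise
        at = transpose(a)
        num, den = matmul(at, v), matmul(matmul(at, a), x)
        x = [[x[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # A <- A * (V X^T) / (A X X^T), element-wise
        xt = transpose(x)
        num, den = matmul(v, xt), matmul(a, matmul(x, xt))
        a = [[a[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        history.append(cost(v, a, x))
        if abs(history[-2] - history[-1]) < tol:
            break  # stationary: stop before max_iter is reached
    return a, x, history
```

Plotting `history` against the iteration index gives a curve of the kind summarized in Fig. 4; for the proposed algorithms the recorded cost would additionally contain the corresponding network-structure term.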

2) Test for community structure preservation: For a
given pair of matrices $\langle V, H \rangle$, H denotes a network.
Suppose we want to factorize $\langle V, H \rangle$ and preserve the
community structure of network H at the same time. To
show the ability of CNMF to keep the community structure
of the original network H, we compare the communities
of the re-constructed networks ($H_{NNMF}$ and $H_{CNMF}$)
with the communities of the original network H. For
quantification purposes, we use clustering evaluation metrics.
The evaluation metrics of document clustering are the Jaccard
Coefficient (JC), Fowlkes & Mallows (FM) and the F1 measure
(F1). Given a clustering result,

• a is the number of pairs of points that are in the same
cluster in both the benchmark and the clustering result;

• b is the number of pairs of points that are in the same
cluster in the benchmark but in different clusters in the
clustering result;

• c is the number of pairs of points that are in different
clusters in the benchmark but in the same cluster in the
clustering result;

and the three metrics are computed by the equations in Table I
(bigger means better).

We randomly generate 1000 pairs of matrices $\langle V, H \rangle$
for each size (10 and 100). The comparison between the
communities of the re-constructed networks and the
communities of the original network is shown in Table II. From
these results, we can draw the conclusion that CNMF
preserves the community structure better than NNMF.
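As an illustration of how the pair counts a, b and c turn into the three scores, the following sketch counts point pairs from two label lists and evaluates JC = a/(a+b+c), FM = ((a/(a+b))(a/(a+c)))^(1/2) and F1 = 2a^2/(2a^2 + ac + ab). The function names `pair_counts` and `clustering_metrics`, and the convention of returning zeros when a = 0, are assumptions made for this example rather than part of the original experiments.

```python
from itertools import combinations
from math import sqrt


def pair_counts(benchmark, predicted):
    """Return (a, b, c) over all unordered pairs of points.

    a: same cluster in both the benchmark and the clustering result
    b: same cluster in the benchmark only
    c: same cluster in the clustering result only
    """
    a = b = c = 0
    for i, j in combinations(range(len(benchmark)), 2):
        same_bench = benchmark[i] == benchmark[j]
        same_pred = predicted[i] == predicted[j]
        if same_bench and same_pred:
            a += 1
        elif same_bench:
            b += 1
        elif same_pred:
            c += 1
    return a, b, c


def clustering_metrics(benchmark, predicted):
    """Jaccard Coefficient, Fowlkes & Mallows and F1 from the pair counts."""
    a, b, c = pair_counts(benchmark, predicted)
    if a == 0:
        # Assumed convention: no agreeing pair means the lowest score.
        return {"JC": 0.0, "FM": 0.0, "F1": 0.0}
    return {
        "JC": a / (a + b + c),
        "FM": sqrt((a / (a + b)) * (a / (a + c))),
        "F1": 2 * a * a / (2 * a * a + a * c + a * b),
    }
```

Because only co-membership of pairs is compared, the cluster labels themselves do not need to match: a clustering that is a relabelling of the benchmark scores 1.0 on all three metrics.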

TABLE I: Evaluation metrics of document clustering/communities

Jaccard Coefficient    $JC = \frac{a}{a + b + c}$
Fowlkes & Mallows      $FM = \left(\frac{a}{a + b} \cdot \frac{a}{a + c}\right)^{1/2}$
F1 measure             $F1 = \frac{2a^{2}}{2a^{2} + ac + ab}$


A. Experiments on Synthetic data 

In this section, we randomly construct two matrices, V
and H, to verify the proposed algorithms, including the
convergence of the algorithms and the ability to preserve the
desired network structures.


3) Test for degree distribution preservation: A pair of
matrices $\langle V, H \rangle$ is randomly generated. H is the
original network whose degree distribution we want to
keep. After running both NNMF and DNMF, we obtain
$A_{NNMF}$ and $A_{DNMF}$, which can re-construct the network
by $H_{NNMF} = A_{NNMF} A_{NNMF}^{T}$ and $H_{DNMF} = A_{DNMF} A_{DNMF}^{T}$.

Fig. 4: The values of the cost functions under the designed update equations, plotted against the iteration number. The three
panels correspond to latent-space dimensions of 10, 100 and 1000, respectively.


TABLE II: Comparison between NNMF and CNMF on preserving community structure

             n = 10 (K = 3)                                          n = 100 (K = 10)
Algorithm    JC                FM                F1                JC                FM                F1
NNMF         0.1914 ± 0.0642   0.3240 ± 0.0930   0.3165 ± 0.0879   0.1894 ± 0.0200   0.3701 ± 0.0487   0.3180 ± 0.0282
CNMF         0.2542 ± 0.0667   0.5172 ± 0.0885   0.4615 ± 0.0870   0.3437 ± 0.0952   0.5209 ± 0.0983   0.5044 ± 0.1022


To show the ability of DNMF to retain
the degree distribution of the original network H, we
compute the correlation between the degree distributions
of the re-constructed networks ($H_{NNMF}$ and $H_{DNMF}$)
and that of the original network H. If the degree distribution
of $H_{DNMF}$ has a larger correlation coefficient with the degree
distribution of H than the degree distribution of $H_{NNMF}$ does,
we can draw the conclusion that DNMF preserves the degree
distribution better than NNMF. Here, we consider two
matrix sizes: 10 and 100. For each size, we randomly
generate 1000 pairs of matrices. The number of hidden
factors is set to 3 and 10, respectively. The average and standard
deviation of the results are given in Table III. These numbers
show that DNMF preserves more of the network degree
distribution structure than NNMF.
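The comparison described here can be sketched in a few lines: re-construct the network as $AA^{T}$, take row sums as (weighted) degrees, and correlate them with the degrees of H. The helper names `reconstruct`, `degree_vector` and `degree_correlation` are hypothetical, and the use of the Pearson coefficient is an assumption, since the text only speaks of a correlation coefficient.

```python
from math import sqrt


def reconstruct(a):
    """Re-constructed network A A^T from the learned factor matrix A."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a]


def degree_vector(h):
    """Row sums of a network matrix: (weighted) degrees of all nodes."""
    return [sum(row) for row in h]


def degree_correlation(h, a):
    """Pearson correlation between the degrees of H and of A A^T."""
    u, v = degree_vector(h), degree_vector(reconstruct(a))
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sqrt(sum((p - mu) ** 2 for p in u))
    sv = sqrt(sum((q - mv) ** 2 for q in v))
    if su == 0.0 or sv == 0.0:
        return 0.0  # assumed convention when one degree vector is constant
    return cov / (su * sv)
```

A value close to 1 means that nodes with many links in H also have large row sums in the re-constructed network, which is the behaviour DNMF is designed to encourage.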

TABLE III: Comparison between NNMF and DNMF on preserving
the degree distribution structure

Algorithm    n = 10 (K = 3)     n = 100 (K = 10)
NNMF         0.0658 ± 0.3314    0.0045 ± 0.0980
DNMF         0.5548 ± 0.2531    0.9079 ± 0.0190


4) Test for max spanning tree preservation: For the
max spanning tree, we first randomly generate a pair
of $V_{100 \times 100}$ and $H_{100 \times 100}$. Using NNMF and TNMF to
factorize V and H, we obtain $A_{NNMF}$ and $A_{TNMF}$,
which can re-construct the network H by
$H_{NNMF} = A_{NNMF} A_{NNMF}^{T}$ and $H_{TNMF} = A_{TNMF} A_{TNMF}^{T}$.
After extracting the max spanning trees of the three networks H,
$H_{NNMF}$ and $H_{TNMF}$, we can compare the similarity
between the re-constructed trees ($T_{NNMF}$ and $T_{TNMF}$)
and the benchmark $T_H$. An illustrative example is shown
in Fig. 5, in which each point in the matrix denotes an
edge between two nodes. To quantify the ability of TNMF
to retain the tree structure, we randomly generated 1000
pairs of V and H for each size: 10 and 100. For each pair
$\langle V, H \rangle$, we run both NNMF and TNMF, and compute
the similarity between each re-constructed tree and the
benchmark tree by the number of overlapping edges as

$$s(T_H, T_{NNMF}) = \mathrm{sum}(T_H \otimes T_{NNMF}) \qquad (44)$$

where sum(M) counts the number of 1s in matrix M.
The number of hidden factors is set to 3 and 10, respectively. If
the re-constructed max spanning tree from TNMF has more
overlapping edges with the original max spanning tree than the
tree from NNMF, we draw the conclusion that TNMF is
able to preserve the max spanning tree better than NNMF.
The results are shown in Table IV. The max in the table
denotes the maximum number of edges in the max spanning
tree. For a network with n nodes, the maximum number of
edges in the max spanning tree is n - 1.
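A small sketch of the overlap count in Eq. (44) is given below. It extracts a max spanning tree with Prim's algorithm from a dense symmetric weight matrix, builds the binary tree mask, and counts shared edges. Several choices here are assumptions for illustration: the function names, treating every node pair as a candidate edge (absent links simply have weight 0), starting Prim's algorithm from node 0, and counting each undirected edge once so that the maximum is n - 1 as in Table IV.

```python
def max_spanning_tree(weights):
    """Prim's algorithm; returns the tree as a set of edges (i, j) with i < j."""
    n = len(weights)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Heaviest edge that connects the current tree to a new node.
        u, v = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda edge: weights[edge[0]][edge[1]],
        )
        edges.add((min(u, v), max(u, v)))
        in_tree.add(v)
    return edges


def tree_mask(edges, n):
    """Binary symmetric mask with 1 on tree edges and 0 elsewhere."""
    mask = [[0] * n for _ in range(n)]
    for i, j in edges:
        mask[i][j] = mask[j][i] = 1
    return mask


def tree_overlap(mask_a, mask_b):
    """Number of 1s shared by two masks, counting each undirected edge once."""
    n = len(mask_a)
    return sum(mask_a[i][j] * mask_b[i][j] for i in range(n) for j in range(i + 1, n))
```

With this convention, comparing a tree with itself returns n - 1, which matches the "max" column header of Table IV.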

TABLE IV: Comparison between NNMF and TNMF on preserving
max spanning tree structure

Algorithm    n = 10 (max = 9)    n = 100 (max = 99)
NNMF         2.5 ± 1.3           21.4 ± 6.3
TNMF         6.2 ± 0.7           56.1 ± 7.2


B. Experiments on Real-world Data 

The above section shows the convergence of the proposed
algorithms and their abilities to preserve the desired network
structures. In this section, we compare the performance of the
proposed algorithms in real-world applications, including
document clustering and recommendation.

Fig. 5: Comparison between NNMF and TNMF on the ability to preserve the max spanning tree. Each point in the figure denotes
a link in the network. The left panel shows the real max spanning tree of the original network; the centre panel shows the max
spanning tree from NNMF; the right panel shows the max spanning tree from TNMF. Part of each panel is highlighted with a red
ellipse, and an approximate comparison can be made through these three ellipses.


1) Document Clustering with One-level Network: Our
document data is Cora^1, which is a public dataset and
consists of 2708 scientific publications, each classified into one
of seven classes. A one-level network is composed of a
horizontal network and a vertical network. The horizontal
network, $H_{2708 \times 2708}$, is formed by the citation relations
between publications and consists of 5429 links. The vertical
network, $V_{2708 \times 1433}$, is constructed from the mapping relation
between documents and words. The detailed statistics are shown in
Table V.


TABLE V: Statistics of Cora dataset

Number of documents    2,708
Number of keywords     1,433
Number of links        5,429
Number of classes      7


After the factorization of $V_{2708 \times 1433}$ and $H_{2708 \times 2708}$
through our proposed algorithms, NNMF, CNMF, DNMF
and TNMF, the documents are projected into a latent space
and given new representations. It is believed that the
different classes are formed as a result of the intrinsic
properties of the documents, so if the discovered latent space is
good enough, it will cause the documents to cluster into
these seven classes. Following this idea, we conduct the
document clustering on the learned matrix A (the new
representations of documents) with the k-means clustering
algorithm. For comparison, we also implement
standard NMF, which does not consider the horizontal
network $H_{2708 \times 2708}$, and the Relational Topic Model (RTM),
which is a successful probabilistic Bayesian model for
$H_{2708 \times 2708}$ and $V_{2708 \times 1433}$. RTM can also be seen as a
method for the factorization but from a probabilistic view,
which also does not consider the network structure.

The final results are shown in Fig. 6. The three subfigures
show the clustering result comparisons under the three metrics in
Table I. We have tested four numbers of factors: K = 100,
K = 300, K = 500 and K = 1000. In each subfigure in
Fig. 6, we compare the results from NNMF, CNMF,
DNMF, TNMF, NMF and RTM on the clustering evaluation
metric. Except for NMF, the algorithms all consider the
effects of the horizontal network $H_{2708 \times 2708}$. From this
result, we can see that NMF achieves the worst performance
of all the methods. Thus, we can draw the conclusion
that incorporating the citation network is helpful for the
clustering of the publications. Except for DNMF, which has
similar performance to RTM, the proposed algorithms (NNMF,
CNMF and TNMF) are better than RTM on this document
clustering task. Notably, CNMF and TNMF achieve the best
performance of all the algorithms. The reason is that the
community structure retained by CNMF is beneficial for the
clustering because it encourages ‘similar’ nodes to cluster
together. For example, suppose two documents $d_i$ and $d_j$ are in
the same community in network H due to their ‘similarity’.
Retaining the community structure makes it more likely that
$d_i$ and $d_j$ remain within the same cluster under the new factor
representations. In TNMF, preserving the max spanning
tree retains the most important relations of all the
nodes/documents. These relations in the tree can be seen
as the ‘bones’ of a network, which determine the weighted
distances between the nodes (documents). Therefore,
TNMF can also benefit document clustering. It should
be noted that although the network structure will influence
the final clustering results, this influence is constrained by
the document-word mapping network V.
Each document exhibits two natures from two networks:
H and V. If the two natures are consistent, the constraint
from V will help to enhance the network structure learned
from H; if the two natures are not consistent, there will be
a contradiction between H and V, which will hinder the
learning of the network structure from H.

^1 http://linqs.cs.umd.edu/projects/projects/lbc/

2) Recommendation with One-level Network: We adopt a
public dataset, Lastfm^2, which is commonly used
for evaluating algorithms for recommender systems.
In this dataset, a vertical network $V_{1892 \times 17632}$ is formed
by users and artists, and $V_{ij}$ represents the number of times
user i listened to artist j. The horizontal network
$H_{1892 \times 1892}$ is the user friend network. The statistics are
shown in Table VI. A certain number of user-artist pairs are
held out as the test data. The task is to predict the listening
counts for these user-artist pairs.

^2 http://labrosa.ee.columbia.edu/millionsong/lastfm

Fig. 6: Comparison of document clustering for different values of K (K = 100, K = 300, K = 500 and K = 1000) with a = 0.1.


TABLE VI: Statistics of Lastfm dataset

number of users               1892
number of artists             17632
number of friend relations    12717
test user-artist pairs        5000


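The evaluation metric is the Mean Absolute Error (MAE),
which is the simplest and the most intuitive. It is defined as

$$MAE = \frac{1}{N} \sum_{u,i} |\hat{r}_{u,i} - r_{u,i}| \qquad (45)$$

where N is the number of test user-artist pairs, $r_{u,i}$ is the
real count and $\hat{r}_{u,i}$ is the predicted count. The correlation
coefficient is computed by

$$\rho = \frac{\sum_{u,i} (r_{u,i} - \bar{r})(\hat{r}_{u,i} - \bar{\hat{r}})}{\sqrt{\sum_{u,i} (r_{u,i} - \bar{r})^{2}}\,\sqrt{\sum_{u,i} (\hat{r}_{u,i} - \bar{\hat{r}})^{2}}} \qquad (46)$$

where $\bar{r}$ and $\bar{\hat{r}}$ are the means of the real and predicted counts.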
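The two evaluation measures can be written down directly. The sketch below takes two equally long lists of real and predicted counts for the test user-artist pairs; the function names are hypothetical, and the correlation is assumed to be the Pearson coefficient.

```python
from math import sqrt


def mean_absolute_error(real, predicted):
    """MAE: average absolute difference between real and predicted counts."""
    return sum(abs(r - p) for r, p in zip(real, predicted)) / len(real)


def correlation(real, predicted):
    """Pearson correlation between real and predicted counts."""
    n = len(real)
    mean_r, mean_p = sum(real) / n, sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(real, predicted))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_r == 0.0 or var_p == 0.0:
        return 0.0  # assumed convention when one side is constant
    return cov / (sqrt(var_r) * sqrt(var_p))
```

The two measures capture different things: predictions shifted by a constant keep a perfect correlation but a nonzero MAE, whereas a predictor that is accurate for only part of the test pairs can reach a low MAE with a weak correlation.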

To show the performance of the proposed algorithms
on recommendation, we compare them with a
state-of-the-art method: Graph regularized NMF (GNMF).
The results are shown in Fig. 7. Standard NMF
(without the horizontal network $H_{1892 \times 1892}$) has the worst
performance of all the algorithms, which is similar to the
document clustering result in Section V.B(1). We can therefore
draw the conclusion that incorporating the user friend
network $H_{1892 \times 1892}$ improves the performance of the
recommendation. GNMF has the same property as CNMF
of preserving the community structure of $H_{1892 \times 1892}$, as
discussed in Section IV.B. The results in Fig. 7 also show
that GNMF has similar performance to CNMF. Although
DNMF has good performance on MAE, its correlation
is the worst among the proposed algorithms
and the state-of-the-art GNMF. This indicates that DNMF
predicts accurate values for some of the test data but not for
all of it. The reason is that DNMF preserves the network degrees
of all the nodes/users. Nodes with relatively large
degrees tend to have large weights in the weighted
degree distribution, so DNMF is biased toward
these nodes during factorization. Therefore, DNMF tends
to achieve good results for the nodes/users with large degrees.
Of all the algorithms, TNMF achieves the best performance
on both MAE and correlation. As discussed in Section
V.B(1), preserving the max spanning tree retains the most
important relations (the ‘bones’ of a network) of all the
nodes/users. These relations reflect not only the ‘distances’
between nodes but also the degrees of nodes.

3) Document Clustering with Two-level Network: First,
we introduce the dataset we use. The documents are a collection
of papers from CiteSeer [61]. There are 3312 papers in
the whole corpus. Each paper is represented by a binary
vector using words. The labels of these papers are set as
their research areas, such as AI (Artificial Intelligence),
ML (Machine Learning), Agents, DB (Database), IR (Information
Retrieval) and HCI (Human-Computer Interaction).
The statistics are shown in Table VII.


TABLE VII: Statistics of CiteSeer dataset

Number of documents    3,312
Number of keywords     3,703
Number of classes      6


A two-level network is composed of two horizontal
networks, $H_{3312 \times 3312}$ and $H_{3703 \times 3703}$, and one vertical
network, $V_{3312 \times 3703}$, which are constructed as follows.
The first-level horizontal network $H_{3312 \times 3312}$ is a paper
citation network, which is formed by the citation relations
between papers. The second-level horizontal network
$H_{3703 \times 3703}$ is the keyword co-occurrence network, which is
formed by the co-occurrence relations between keywords.
The vertical network $V_{3312 \times 3703}$ is the mapping relation
between documents and keywords (there is a link
between a keyword and a document if the keyword appears
in the document).

Fig. 7: Comparison of recommendation performance (correlation and MAE) for different values of K (K = 100, K = 300, K = 500 and K = 700) with a = 10.

It should be noted that we keep only one type of structure
for the two horizontal networks. For example, CNMF only
keeps the community structures of both networks, hence
Eq. (23) only adds $a \cdot \frac{1}{2}\|P_X - X\|_F^2$. For brevity, the
same coefficient is used for both networks.

Here, we compare the performance of the one-level
network ($H_{3312 \times 3312}$ only) with that of the two-level network
($H_{3312 \times 3312}$ and $H_{3703 \times 3703}$). The comparisons on the three
evaluation metrics are shown in Fig. 8. Except for CNMF
with K = 300, the two-level network outperforms the
one-level network. This means that incorporating the keyword
co-occurrence network improves on document clustering
with only the document citation network.

4) Recommendation with Two-level Network: The dataset
used for recommendation with a two-level network is
Delicious. This dataset is collected from the Delicious
website^3, which records the bookmarks that users place on
webpages/URLs according to their interests. We filter this
dataset by keeping the URLs that are bookmarked by at least
three users, which leaves 5633 URLs. The links between URLs
are generated from their tags. On the Delicious website, each
URL receives tags from users, and these tags give a hint about
the content/semantics of the URL. We use these tags to compute
the content similarity between URLs, representing each URL as a
tag vector and applying the cosine similarity metric. The
statistics of the dataset are shown in Table VIII. In this
dataset, the vertical network $V_{1867 \times 5633}$ is formed by users
and URLs. The horizontal networks are the user friend network
$H_{1867 \times 1867}$ and the content similarity network
$H_{5633 \times 5633}$. A certain number of user-URL pairs are held out
as the test data. The evaluation metrics are given in Eq. (45)
and Eq. (46). As shown in Fig. 9, the
two-level network still outperforms the one-level network.

^3 http://www.delicious.com
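The construction of the content similarity network can be sketched as follows. Each URL is represented as a sparse tag vector (a dictionary mapping a tag to how often it was assigned), and pairs of URLs are linked with their cosine similarity as the weight. The function names, the example identifiers such as `url_a`, and the `threshold` parameter are hypothetical; the text does not state whether links below some similarity value were discarded.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two sparse tag vectors (tag -> count)."""
    dot = sum(weight * v.get(tag, 0) for tag, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a URL without tags is similar to nothing
    return dot / (norm_u * norm_v)


def similarity_network(tag_vectors, threshold=0.0):
    """Weighted links between URL pairs whose similarity exceeds `threshold`."""
    urls = sorted(tag_vectors)
    links = {}
    for idx, first in enumerate(urls):
        for second in urls[idx + 1:]:
            score = cosine_similarity(tag_vectors[first], tag_vectors[second])
            if score > threshold:
                links[(first, second)] = score
    return links
```

For example, `similarity_network({"url_a": {"python": 2, "ml": 1}, "url_b": {"python": 4, "ml": 2}, "url_c": {"cooking": 3}})` links only `url_a` and `url_b`, because `url_c` shares no tag with either of them.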


TABLE VIII: Statistics of Delicious dataset

number of users        1867
number of URLs         5633
test user-URL pairs    2000


Despite the good performance on both document clustering
and recommendation tasks, we advise care when combining the
different networks: the vertical network and the different
horizontal networks. For example, suppose there are a user-movie
vertical network and a movie network for the recommendation
task, and the movie network is formed by the distance
between the release dates of the movies. We know that
the release date of a movie does not strongly affect
the ratings from users. In this situation, the combination of
the two networks may not improve the rating prediction due to
the ‘noise’ from the movie network. Since the information
in the movie network does not help the prediction of the
ratings, the constraint from the movie network will reduce the
information learned from the user-movie vertical network.
However, suppose the movie network is instead formed
by another strategy: movies with similar directors and
actors tend to have links. Unlike the old movie network, this
new movie network will help to predict the user ratings on
movies. In this new situation, the combination of
the movie network with the user-movie network will help to
improve the rating prediction. Therefore, the consistency
between the different networks is important and should be
considered before using the multi-level network model.

Fig. 8: Comparison of the influence of the one-level network (with one horizontal network and one vertical network) and the two-level
network (with two horizontal networks and one vertical network) on document clustering with a = 0.1 and K = 100, K = 300,
K = 500 and K = 700.


VI. Conclusions and future study 

In this paper, we have introduced a general multi-level 
network model, and proposed four algorithms for multi¬ 
level network factorization with four different network 
structure constraints. The network structural constraints 
have been incorporated into the cost function of traditional 
nonnegative matrix factorization. The optimization of the 
new cost function has constructed a new latent space 
and projected all nodes in different levels into this new 
latent space. At the same time, the projected nodes will, 
as far as possible, retain the original network structure. 
Four algorithms with their convergence proofs have been
carefully designed to find the optimal latent spaces. Finally,
experiments on synthetic and real-world data show that
our algorithms are able to preserve the network structures, 
and can be used for recommender systems and document 
clustering. 

Some interesting points remain for further study. For
example, the number of dimensions of the discovered latent
space needs to be predefined in the present algorithms.
The ability to automatically find an optimized number
for the shared latent space would make multi-level network
factorization more practical for real-world tasks.